Arm-Acquiring Bandits
Authors
Abstract
Similar resources
Best Arm Identification for Contaminated Bandits
This paper studies active learning in the context of robust statistics. Specifically, we propose the Contaminated Best Arm Identification variant of the multi-armed bandit problem, in which every arm pull has probability ε of generating a sample from an arbitrary contamination distribution instead of the true underlying distribution. The goal is to identify the best (or approximately best) true...
Best-Arm Identification in Linear Bandits
We study the best-arm identification problem in linear bandit, where the rewards of the arms depend linearly on an unknown parameter θ and the objective is to return the arm with the largest reward. We characterize the complexity of the problem and introduce sample allocation strategies that pull arms to identify the best arm with a fixed confidence, while minimizing the sample budget. In parti...
A Parallel Program for 3-arm Bandits
We describe a new parallel program for optimizing and analyzing 3-arm Bernoulli bandit problems. Previous researchers had considered this problem computationally intractable, and we know of no previous exact optimizations of 3-arm bandit problems. Despite this, our program is able to solve problems of size 100 or more. We describe the techniques used to achieve this, and indicate various extens...
Best Arm Identification in Multi-Armed Bandits
We consider the problem of finding the best arm in a stochastic multi-armed bandit game. The regret of a forecaster is here defined by the gap between the mean reward of the optimal arm and the mean reward of the ultimately chosen arm. We propose a highly exploring UCB policy and a new algorithm based on successive rejects. We show that these algorithms are essentially optimal since their regre...
On Robust Arm-Acquiring Bandit Problems
In the classical multi-armed bandit problem, at each stage, the player has to choose one from N given projects (arms) to generate a reward depending on the arm played and its current state. The state process of each arm is modeled by a Markov chain and the transition probability is priorly known. The goal of the player is to maximize the expected total reward. One variant of the problem, the so...
Journal
Journal title: The Annals of Probability
Year: 1981
ISSN: 0091-1798
DOI: 10.1214/aop/1176994469